# Large-scale Corpus

**Randeng Pegasus 523M Summary Chinese V1** (IDEA-CCNL)
A Chinese PEGASUS-large model specialized in text summarization, fine-tuned on multiple Chinese summarization datasets.
Tags: Text Generation · Transformers · Chinese
Downloads: 95 · Likes: 5

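For orientation, here is a minimal sketch of how a PEGASUS-style Chinese summarization checkpoint such as this one is typically driven through the transformers library. The model id used below and the availability of a standard AutoTokenizer are assumptions; the official model card may require a custom tokenizer instead.

```python
# Minimal sketch of abstractive summarization with a PEGASUS-style checkpoint.
# Assumptions: the model id below is correct and the checkpoint works with
# AutoTokenizer; the official card may ship a custom tokenizer instead.
from transformers import AutoTokenizer, PegasusForConditionalGeneration

model_id = "IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = PegasusForConditionalGeneration.from_pretrained(model_id)

text = "据报道，今日某地举办了大型科技展览，吸引了来自全国各地的参观者。"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
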
**ERNIE 3.0 Mini Zh** (nghuyong)
ERNIE 3.0 is a large-scale knowledge-enhanced pre-trained model for Chinese language understanding and generation; the mini version is its lightweight variant.
Tags: Large Language Model · Transformers · Chinese
Downloads: 569 · Likes: 2

**ScholarBERT** (globuslabs, Apache-2.0)
A BERT-large variant with 340 million parameters, pretrained on a large-scale collection of scientific papers and specialized in scientific literature comprehension.
Tags: Large Language Model · Transformers · English
Downloads: 25 · Likes: 9

**ProcBERT** (fbaigt)
ProcBERT is a pre-trained language model optimized for procedural text, pre-trained on a large corpus of procedural documents (biomedical literature, chemical patents, and cooking recipes), and performs strongly on downstream tasks.
Tags: Large Language Model · Transformers · English
Downloads: 13 · Likes: 1

**IndoBERT Large P2** (indobenchmark, MIT)
IndoBERT is a state-of-the-art Indonesian language model based on the BERT architecture, trained with the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) objectives.
Tags: Large Language Model · Other
Downloads: 2,272 · Likes: 8

**ELECTRA Base GC4 64k 500000 Cased Generator** (stefan-it, MIT)
A German ELECTRA base generator model trained on the cleaned German Common Crawl corpus (GC4), roughly 844GB of text; the corpus, and therefore the model, may contain biases.
Tags: Large Language Model · Transformers · German
Downloads: 16 · Likes: 0

**Wav2Vec2 Base NL VoxPopuli** (facebook)
A Wav2Vec2 base model pretrained on the Dutch subset of the VoxPopuli corpus, intended as a starting point for Dutch speech recognition tasks.
Tags: Speech Recognition · Transformers · Other
Downloads: 31 · Likes: 0

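Because this checkpoint is pretrained only (it ships without a CTC head), the following is a hedged sketch of using it to extract speech representations before fine-tuning for transcription. The model id and the use of a default Wav2Vec2FeatureExtractor are assumptions.

```python
# Minimal sketch: extract frame-level speech representations from the
# pretrained Dutch wav2vec2 checkpoint. No ASR head is included; fine-tuning
# with a CTC head would be needed for actual transcription.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_id = "facebook/wav2vec2-base-nl-voxpopuli"  # assumed id
feature_extractor = Wav2Vec2FeatureExtractor(sampling_rate=16000)  # default settings, assumed adequate
model = Wav2Vec2Model.from_pretrained(model_id)

# One second of silence at 16 kHz stands in for real Dutch audio.
waveform = torch.zeros(16000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # shape: (1, frames, 768)
print(hidden_states.shape)
```
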
**BERT Large AraBERTv2** (aubmindlab)
AraBERT is a pre-trained language model based on Google's BERT architecture, designed specifically for Arabic natural language understanding tasks.
Tags: Large Language Model · Arabic
Downloads: 334 · Likes: 11

**Chinese MobileBERT** (Ayou, Apache-2.0)
Pre-trained on a 250-million-word Chinese corpus using the MobileBERT architecture; training took 15 days and 1 million steps on a single A100 GPU.
Tags: Large Language Model · Transformers
Downloads: 25 · Likes: 5

**XLM-RoBERTa Large** (FacebookAI, MIT)
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data covering 100 languages, trained with a masked language modeling objective.
Tags: Large Language Model · Multilingual
Downloads: 5.3M · Likes: 431

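Since XLM-RoBERTa is a masked language model, it can be probed directly with the fill-mask pipeline. Below is a minimal sketch, assuming the checkpoint id published under the FacebookAI organization.

```python
# Minimal sketch: masked-token prediction with XLM-RoBERTa-large.
# The model id is assumed to be the checkpoint published under FacebookAI.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="FacebookAI/xlm-roberta-large")
# XLM-RoBERTa uses <mask> as its mask token.
for prediction in unmasker("Paris is the <mask> of France."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```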